This is an R Markdown Notebook. When you execute code within the notebook, the results appear beneath the code.
Time Series Forecasting :- A time series is a sequence of numbers, each recorded together with the time at which it was observed.
# defining a time series
y = ts(c(123,74,88,34), start = 2012)
y
Time Series:
Start = 2012
End = 2015
Frequency = 1
[1] 123 74 88 34
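A ts object carries its time index with it, so the index does not have to be stored separately. As a minimal sketch using the same four values as above, the base R accessors below recover that index:

```r
# Same series as above: four annual observations starting in 2012
y <- ts(c(123, 74, 88, 34), start = 2012)

time(y)       # time index attached to each observation
start(y)      # first time point, as c(year, period within the year)
end(y)        # last time point
frequency(y)  # number of observations per year
```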
Frequency :- number of observations before the seasonal pattern repeats.
# defining a time series with frequency more than 1
y = ts(runif(36,50,100), start = 2012, frequency = 12)
# Frequency = 12 -> Monthly data
# Frequency = 4 -> Quarterly data
# Frequency = 52 -> Weekly data
# Frequency = 1 -> Annual data
y
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2012 97.21065 69.64123 68.16275 65.72687 64.88251 62.31411 97.37136 94.13365 64.48506 78.10728 75.87194 94.54482
2013 53.74235 73.87685 69.13449 63.42162 61.35169 94.37257 97.49104 53.18571 53.58086 97.75685 83.49069 70.30487
2014 82.72169 66.05735 95.78260 74.94265 99.25124 98.67068 73.15769 92.99046 63.33119 85.21422 81.67910 56.29820
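With a frequency above 1, each observation has a position within the year. The sketch below rebuilds a monthly series like the one above (the seed is an assumption added only for reproducibility) and uses cycle() and window() from base R to inspect the positions and extract one calendar year:

```r
set.seed(1)  # assumption: fixed seed, only so the sketch is reproducible
y <- ts(runif(36, 50, 100), start = 2012, frequency = 12)

# Position of each observation within the year (1 = Jan, ..., 12 = Dec)
cycle(y)

# Extract a single calendar year; start and end are given as c(year, month)
y2013 <- window(y, start = c(2013, 1), end = c(2013, 12))
length(y2013)
```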
Time Plots :- For time series data, the most useful graph to start with is a time plot. We will now look at a time plot of the weekly economy-class passenger load on Ansett Airlines, from the “melsyd” dataset.
The autoplot command plots the time series stored in the Economy.Class column.
library(fpp2) # assumed setup: attaches forecast, ggplot2 and fma, plus the datasets used below
autoplot(melsyd[,"Economy.Class"]) +
ggtitle("Economy class passengers: Melbourne - Sydney") +
xlab("Year") +
ylab("Thousands")
Learnings :-
1. There was a period in 1989 when no passengers were carried.
2. There was a period in 1992 when the load was low.
3. There was an increase in load in the second half of 1991.
4. There is a dip in load at the start of each year.
5. There is a long-term fluctuation in the level of the series, which increases in 1987, decreases in 1989 and then increases again in 1991.
Now let’s look at a simpler time series: monthly scripts for pharmaceutical products.
autoplot(a10) +
ggtitle("Antidiabetic drug sale") +
ylab("$ million") +
xlab("Year")
Learnings :-
1. There is an increasing trend.
2. There is a seasonal pattern that increases in size as the level increases.
Time Series Patterns :-
1. Trend :- A trend is a long-term increase or decrease in the data. A trend may also change direction within the time series, for example from an increasing trend to a decreasing trend.
2. Seasonal :- A seasonal pattern occurs when a time series is affected by seasonal factors, e.g. the time of the year or the day of the week. Seasonality has a fixed and known frequency.
3. Cyclic :- A cycle occurs when the data exhibits rises and falls that are not of a fixed frequency.
Cycles are usually longer than seasonal patterns, and the magnitude of cycles tends to be more variable than the magnitude of seasonal patterns.
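To make the difference between a trend and a seasonal pattern concrete, here is a small simulated sketch. Every number in it (slope, amplitude, noise level, start date) is an arbitrary assumption chosen for illustration, not something estimated from the datasets in this notebook:

```r
set.seed(42)  # assumption: fixed seed for reproducibility
n <- 120
t <- seq_len(n)

trend    <- 0.5 * t                    # long-term increase
seasonal <- 10 * sin(2 * pi * t / 12)  # repeats exactly every 12 months
noise    <- rnorm(n, sd = 2)

# One column per component, plus the series we would actually observe
sim <- ts(cbind(trend = trend,
                seasonal = seasonal,
                combined = trend + seasonal + noise),
          start = c(2000, 1), frequency = 12)

# plot() on a multivariate ts draws one panel per column
plot(sim, main = "Simulated trend, seasonal pattern and their sum")
```

The seasonal column repeats with a fixed, known period, which is what separates it from a cycle of varying length.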
par(mfrow = c(2,2))
plot(elec, xlab = "year", ylab = "Australian Monthly electricity production")
plot(ustreas, xlab = "time", ylab = "US treasury bill contracts")
plot(hsales, xlab = "year", ylab = "Monthly Housing Sales (millions)")
plot(elecequip, xlab = "year", ylab = "Monthly Electrical Equipment Production")
Observations :-
1. Electricity production data has a strong increasing trend and a seasonal pattern with annual frequency that increases with the level.
2. US treasury bill data has a strong decreasing trend.
3. Monthly housing sales data shows no trend but seasonality within each year. Also, a cyclic pattern is visible that lasts for 6-10 years.
4. Monthly housing sales data shows no trend but seasonality within each year. Also, a cyclic pattern is visible that lasts for 6-8 years.
Seasonal Plots
Seasonal Plots are similar to time plots except that the data are plotted against the individual seasons.
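As a rough base R sketch of what a seasonal plot does, the block below reshapes a monthly series into one column per year and draws each year as its own line against the month. It uses AirPassengers, which ships with base R, instead of the datasets in this notebook, so that it runs without extra packages:

```r
# AirPassengers: monthly airline passenger totals, 1949-1960 (datasets package)
by_year <- matrix(AirPassengers, nrow = 12)  # rows = months, columns = years
colnames(by_year) <- 1949:1960

# One line per year, plotted against the month
matplot(by_year, type = "l", lty = 1, xaxt = "n",
        xlab = "Month", ylab = "Passengers (thousands)")
axis(1, at = 1:12, labels = month.abb)
```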
Seasonal plot for the antidiabetic drug sales
ggseasonplot(a10, year.labels = TRUE, year.labels.left = TRUE) +
ggtitle("Seasonal plot: antidiabetic drug sales") +
ylab("$ million")
Seasonal plots help in seeing the seasonal pattern more clearly, and are useful for identifying the years in which the pattern changed.
Learnings from the above plot :-
1. There is a big jump in sales in January.
2. The slope at March 2008 is negative, the opposite of other years.
3. The sales pattern in April is not consistent across years.
We can also use polar coordinates for the above plot
ggseasonplot(a10, polar = TRUE) +
ggtitle("Seasonal plot: antidiabetic drug sales") +
ylab("$ million")
Seasonal Subseries Plots
These charts represent the data for each season in separate mini time plots.
ggsubseriesplot(a10) +
ylab("$ million") +
ggtitle("Seasonal Subseries plot: Antidiabetic Drug Sales")
The horizontal lines represent the average for each month. This plot can be used to identify changes within a season.
Scatter Plots
Time plots can be used to study relationships between time series.
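The per-month averages drawn as horizontal lines can also be computed directly. This is a minimal sketch on AirPassengers (a base R dataset, used here only so the block is self-contained), grouping the observations by their position in the year:

```r
# Group observations by month position (1-12) and average each group
monthly_means <- tapply(as.numeric(AirPassengers),
                        as.integer(cycle(AirPassengers)),
                        mean)
names(monthly_means) <- month.abb

round(monthly_means, 1)
```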
autoplot(elecdemand[,c("Demand","Temperature")],facets = TRUE) +
xlab("Year : 2014") + ylab("") +
ggtitle("Half-hourly electricity demand: Victoria, Australia")
We can study the relationship between demand and temperature by plotting one series against the other.
qplot(Temperature, Demand, data = as.data.frame(elecdemand)) +
xlab("Temperature (Celsius)") + ylab("Demand (GW)")
It can be seen that demand is high when temperature is either very high or very low.
Correlation
The correlation coefficient measures the strength of the linear relationship between 2 variables. It lies between -1 and 1. A negative value indicates a negative relationship and a positive value indicates a positive relationship.
cor(elecdemand[,c("Demand","Temperature")])
Demand Temperature
Demand 1.0000000 0.2798072
Temperature 0.2798072 1.0000000
The above matrix shows that the correlation between demand and temperature is 0.28 (low correlation), but the scatter plot above showed a strong non-linear relationship between demand and temperature. Because the relationship is non-linear, the correlation coefficient does not capture it.
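The same effect can be reproduced on made-up data. In this sketch (the variables x and y are hypothetical, not taken from elecdemand), y is completely determined by x, yet the relationship is U-shaped and the correlation coefficient comes out at essentially zero:

```r
x <- seq(-3, 3, length.out = 201)  # symmetric around zero
y <- x^2                           # exact, but U-shaped, dependence on x

cor(x, y)   # essentially zero: the linear measure misses the relationship
plot(x, y)  # the scatter plot makes the dependence obvious
```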
Quarterly visitor numbers for five regions of New South Wales, Australia
autoplot(visnights[,1:5], facets = TRUE) +
ylab("Number of visitors nights each quarter (millions)")
Besides plotting these 5 time series against time, we can plot each series against each of the others. These plots can be arranged in a scatterplot matrix.
GGally::ggpairs(as.data.frame(visnights[,1:5]))
Observations from the above plot :-
1. Strong correlation between visitors to the NSW south coast and the north coast.
2. Strong correlation between visitors to the NSW south coast and the metropolitan region.
3. Strong correlation between visitors to the NSW north coast and the metropolitan region.
Lag Plots
Lag plots show y(t) plotted against y(t-k) for different values of k.
beer2 = window(ausbeer, start = 1992)
gglagplot(beer2)
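Each panel of a lag plot is a scatter plot of the series against a shifted copy of itself. The sketch below builds those pairs by hand for lag 4 on a simulated quarterly series (the pattern values and noise level are assumptions made up for illustration) and then draws the base R equivalent with lag.plot():

```r
set.seed(7)  # assumption: fixed seed for reproducibility
# Simulated quarterly series: a repeating 4-quarter pattern plus noise
q <- ts(rep(c(10, 20, 15, 30), times = 10) + rnorm(40),
        start = c(2000, 1), frequency = 4)

k <- 4
n <- length(q)
# Pair y(t-k) with y(t): same quarter, one year apart
pairs_k <- data.frame(lagged = q[1:(n - k)], current = q[(k + 1):n])

cor(pairs_k$lagged, pairs_k$current)  # high, since lag 4 compares like with like

lag.plot(q, lags = 4)  # base R lag plot for lags 1 to 4
```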
Autocorrelation
Autocorrelation measures the linear relationship between the lagged values of a time series.
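The autocorrelation at lag k is calculated as :-
\[r_k = \frac{\sum_{t=k+1}^{T} (y_t - \bar{y})(y_{t-k} - \bar{y})}{\sum_{t=1}^{T} (y_t - \bar{y})^2}\]
where \(\bar{y}\) is the mean of the series and \(T\) is its length.
The autocorrelations for the first 9 lags, corresponding to the nine scatterplots above, are :-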
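As a check on the formula, this sketch implements it directly and compares the result with acf() from base R on simulated data. The helper name autocorr is hypothetical, introduced only for this illustration:

```r
# Hypothetical helper: autocorrelation at lag k, following the formula above
autocorr <- function(y, k) {
  n    <- length(y)
  ybar <- mean(y)
  num  <- sum((y[(k + 1):n] - ybar) * (y[1:(n - k)] - ybar))
  den  <- sum((y - ybar)^2)
  num / den
}

set.seed(123)  # assumption: fixed seed for reproducibility
y <- rnorm(40)

manual  <- sapply(1:5, function(k) autocorr(y, k))
builtin <- acf(y, lag.max = 5, plot = FALSE)$acf[2:6]  # element 1 is lag 0

all.equal(manual, builtin)
```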
acf(beer2, lag.max = 9,plot = FALSE)
Autocorrelations of series ‘beer2’, by lag
0.00 0.25 0.50 0.75 1.00 1.25 1.50 1.75 2.00 2.25
1.000 -0.102 -0.657 -0.060 0.869 -0.089 -0.635 -0.054 0.832 -0.108
The plot of the autocorrelations is also called a correlogram.
ggAcf(beer2)
The dashed blue lines indicate where the correlation is significantly different from zero.
Observations :-
1. r4 is higher than the autocorrelation at the other lags. This is due to the seasonal pattern in the data: peaks tend to be 4 quarters apart, and so do troughs.
2. r2 is more negative than the autocorrelation at the other lags because troughs tend to be 2 quarters behind peaks.
When the data is seasonal, the autocorrelations at the seasonal lags (multiples of the seasonal frequency) will be larger than at other lags. Data with a trend will show strong autocorrelation at small lags that decreases as the lag increases, because values nearby in time are also nearby in size. When the data have both trend and seasonality, we will see a combination of these effects.
Let’s look at a dataset that has both trend and seasonality, Monthly Australian Electricity Demand
aelec = window(elec, start = 1980)
autoplot(aelec) + xlab("Year") + ylab("GWh") + ggtitle("Monthly Australian Electricity Demand")
ggAcf(aelec, lag.max = 48)
The slow decrease in the ACF as the lags increase is due to the trend, while the “scalloped” shape is due to the seasonality.
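The same two features can be reproduced on simulated data. In this sketch the slope, seasonal amplitude and noise level are arbitrary assumptions. The ACF peaks at the seasonal lags 12 and 24, and the peak at 24 is lower than the peak at 12 because of the trend:

```r
set.seed(99)  # assumption: fixed seed for reproducibility
t <- 1:120
sim <- ts(0.1 * t + 5 * sin(2 * pi * t / 12) + rnorm(120),
          start = c(2000, 1), frequency = 12)

r <- acf(sim, lag.max = 24, plot = FALSE)$acf[-1]  # drop lag 0

round(r[c(6, 12, 24)], 2)  # half-year lag vs. the two seasonal lags

acf(sim, lag.max = 48)     # scalloped shape on top of a slow decay
```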
White Noise :- Time series that show no autocorrelation are called white noise.
An example of white noise
set.seed(30)
y = ts(rnorm(50))
autoplot(y) + ggtitle("White noise")
ggAcf(y)
In white noise, we expect each autocorrelation to be close to zero. In the above graph, we can see that all the spikes lie within the blue dashed lines.
95% of the spikes in the ACF should lie within \(\pm 2/\sqrt{T}\), where T is the length of the time series. If more than 5% of the spikes lie outside this range, then the time series is probably not white noise.
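The bound is easy to compute by hand. This sketch recreates the white noise series from above and counts the share of ACF spikes that fall outside \(\pm 2/\sqrt{T}\). The variable names n_obs, bound and share_outside are introduced here for illustration:

```r
set.seed(30)
y <- ts(rnorm(50))

n_obs <- length(y)          # T in the text (T is avoided as a name: it means TRUE in R)
bound <- 2 / sqrt(n_obs)    # approximate 95% limit for white noise

r <- acf(y, plot = FALSE)$acf[-1]      # drop lag 0, which is always 1
share_outside <- mean(abs(r) > bound)  # compare with the 5% rule of thumb

bound
share_outside
```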
Exercises
Using the help function, we will look at the following 3 datasets :-
1. gold :- Daily morning gold prices in US dollars.
2. woolyrnq :- Quarterly production of woollen yarn in Australia: tonnes.
3. gas :- Australian monthly gas production.
Now we will explore each dataset separately.
autoplot(gold) +
ggtitle("Daily Morning gold prices in US dollars") +
xlab("Day") +
ylab("Prices in $")
Observations :-
1. There are some missing observations in the data.
2. There is no strong evidence of seasonality.
3. There seems to be an increasing trend, but in the later years it turns downward. This is probably because we are looking at part of a cyclic variation.
4. There is one outlier in the data.
We will find the frequency of the dataset using the frequency function, and locate the outlier in the data.
frequency(gold)
[1] 1
which.max(gold)
[1] 770
Frequency is 1 because the daily observations are stored without a seasonal period. The outlier is the 770th observation of the series, which starts on 1st Jan 1985.
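Both checks used on gold, locating missing values and locating the largest observation, can be tried on a tiny made-up series. The values below are hypothetical and are not taken from the gold dataset:

```r
# Hypothetical daily prices with two gaps and one spike
x <- ts(c(300, 305, NA, 310, 590, 312, NA, 308))

which(is.na(x))   # positions of the missing observations
i <- which.max(x) # position of the largest value; NAs are ignored
i
x[i]              # the outlying value itself
```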
autoplot(woolyrnq) +
ggtitle("Quarterly production of woollen yarn in Australia") +
xlab("Quarter") +
ylab("production in tonnes")
Observations :-
1. There is a downward trend in the data.
2. A cyclic pattern can be seen, with cycles lasting around 5 years.
3. Seasonality is present in the later years of the data.
We will find the frequency of the dataset using the frequency function.
frequency(woolyrnq)
[1] 4
Frequency is 4 as the data is on quarterly level.
autoplot(gas) +
ggtitle("Australian monthly gas production") +
xlab("Month") +
ylab("Production")
Observations :-
1. Annual seasonality is present in the data and increases in size as the level increases.
2. There is an increasing trend in the data.
3. There is no cyclic variation.
frequency(gas)
[1] 12
Frequency is 12 as the data is on monthly level.
Now we will explore a sales dataset from a small company over the period 1981-2005.
tute1 = read.csv("/Users/ankittyagi/Documents/Time_series_data/tute1 .csv", header = TRUE)
head(tute1)
Data description :-
1. Sales :- Quarterly sales for a small company over the period 1981-2005.
2. AdBudget :- Advertising budget.
3. GDP :- Gross domestic product.
Now we will convert the data to a time series and visualize it.
mytimeseries = ts(tute1[,-1], start = 1981, frequency = 4)
autoplot(mytimeseries, facets = TRUE)
We can see that Sales and AdBudget seem to follow the same pattern over the years.
Now we will look at another dataset that represents retail sales in various categories for different Australian states.
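The conversion from a data frame to a multivariate time series can be tried without the CSV file. In this sketch the data frame toy and all of its values are hypothetical stand-ins with the same column layout as the description above:

```r
# Hypothetical stand-in for the CSV: a label column plus three numeric columns
toy <- data.frame(
  Quarter  = c("Mar-81", "Jun-81", "Sep-81", "Dec-81",
               "Mar-82", "Jun-82", "Sep-82", "Dec-82"),
  Sales    = c(1020, 890, 795, 1003, 1058, 944, 778, 932),
  AdBudget = c(659, 590, 500, 693, 715, 625, 477, 623),
  GDP      = c(252, 290, 277, 243, 238, 278, 280, 246)
)

# Drop the label column; ts() supplies the time index instead
toy_ts <- ts(toy[, -1], start = c(1981, 1), frequency = 4)

toy_ts        # printed with year and quarter labels
plot(toy_ts)  # base R equivalent of a faceted time plot
```

Dropping the first column matters: ts() expects numeric data, and the quarter labels are replaced by the start and frequency arguments.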
retaildata = readxl::read_excel("/Users/ankittyagi/Documents/Time_series_data/retail.xlsx",skip =1)
str(retaildata)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 381 obs. of 190 variables:
$ Series ID: POSIXct, format: "1982-04-01" "1982-05-01" "1982-06-01" "1982-07-01" ...
$ A3349335T: num 303 298 298 308 299 ...
$ A3349627V: num 41.7 43.1 40.3 40.9 42.1 42 46.1 46.5 53.8 43.8 ...
$ A3349338X: num 63.9 64 62.7 65.6 62.6 64.4 66 65.3 77.9 65.1 ...
$ A3349398A: num 409 405 401 414 404 ...
$ A3349468W: num 65.8 65.8 62.3 68.2 66 62.3 66.2 68.9 90.8 58 ...
$ A3349336V: num 91.8 102.6 105 106 96.9 ...
$ A3349337W: num 53.6 55.4 48.4 52.1 54.2 ...
$ A3349397X: num 211 224 216 226 217 ...
$ A3349399C: num 94 105.7 95.1 95.3 82.8 ...
$ A3349874C: num 32.7 35.6 32.5 33.5 29.4 32.2 31.9 35 51.7 31.4 ...
$ A3349871W: num 127 141 128 129 112 ...
$ A3349790V: num 178 203 176 173 170 ...
$ A3349556W: num 50.4 49.9 48 48.6 51.3 49.6 51.6 55.8 69.9 50.1 ...
$ A3349791W: num 22.2 23.1 22.8 23.2 21.4 21.8 21 23.5 31.4 20.7 ...
$ A3349401C: num 43 45.3 43.7 46.5 44.8 43.9 45.6 45.3 55 47.4 ...
$ A3349873A: num 62.4 63.1 59.6 61.9 60.7 61.2 62.1 68.3 104 63.9 ...
$ A3349872X: num 178 182 174 180 178 ...
$ A3349709X: num 61.8 60.8 58.7 60.3 56.1 58.1 53.9 61.2 75.7 54.2 ...
$ A3349792X: num 85.4 84.8 80.7 82.4 80.7 82.1 87.3 87.4 97.2 93 ...
$ A3349789K: num 147 146 139 143 137 ...
$ A3349555V: num 1250 1300 1234 1265 1218 ...
$ A3349565X: num 258 257 261 266 247 ...
$ A3349414R: num 17.3 18.1 18.1 18.9 19 18.4 20.9 22.4 29.7 22.9 ...
$ A3349799R: num 34.9 34.6 34.6 35.2 33.8 35.4 38 38.2 43.9 36 ...
$ A3349642T: num 310 310 314 320 300 ...
$ A3349413L: num 58.2 62 53.8 57.9 59.2 57.1 66.9 78.1 87.5 58.8 ...
$ A3349564W: num 55.8 58.4 53.7 56.9 56.7 58.9 59.6 63.2 90.3 55.5 ...
$ A3349416V: num 59.1 59.2 59.8 59.8 62.2 63.6 64.1 82.5 143 64.3 ...
$ A3349643V: num 173 180 167 174 178 ...
$ A3349483V: num 93.6 95.3 85.2 91.6 85.2 ...
$ A3349722T: num 26.3 27.1 24.3 25.6 23.5 24.3 25.8 29 39.8 25 ...
$ A3349727C: num 120 122 110 117 109 ...
$ A3349641R: num 104.2 110.2 96.7 104.6 92.5 ...
$ A3349639C: num 42.2 42.1 38.5 38.9 39.5 41.7 46.2 43.5 57.2 43.7 ...
$ A3349415T: num 15.6 15.8 15.2 15.2 14.5 15.1 16.3 17.5 21.5 15.6 ...
$ A3349349F: num 31.6 31.5 29.6 35.2 34.7 34.2 35.9 38 56.5 34.1 ...
$ A3349563V: num 34.4 34.4 33.5 33.4 33.2 34.5 36.7 40.7 57.3 35.8 ...
$ A3349350R: num 124 124 117 123 122 ...
$ A3349640L: num 36.4 36.2 35.7 34.6 32.5 33.9 37.7 40.3 45.2 36.9 ...
$ A3349566A: num 48.7 48.9 47.1 47.5 49.3 50.7 54.1 57.3 64.1 57.7 ...
$ A3349417W: num 85.1 85.1 82.8 82.1 81.8 ...
$ A3349352V: num 916 931 887 921 883 ...
$ A3349882C: num 139 136 144 150 144 ...
$ A3349561R: num NA NA NA NA NA NA NA NA NA NA ...
$ A3349883F: num NA NA NA NA NA NA NA NA NA NA ...
$ A3349721R: num 162 159 167 173 166 ...
$ A3349478A: num 31.8 32.8 34.9 34.6 32.9 33.7 31.7 33.8 42.6 28.8 ...
$ A3349637X: num 46.6 49.6 51.4 50.9 51.6 49.6 49.1 53.2 79 50.1 ...
$ A3349479C: num 13.3 12.7 12.9 13.9 12.8 14.5 13.1 14.9 29.4 14.1 ...
$ A3349797K: num 91.6 95 99.2 99.4 97.3 ...
$ A3349477X: num 28.9 30.6 30.5 27.9 27.4 29.1 33.4 35.5 48.8 29.7 ...
$ A3349719C: num 13.9 14.7 14.5 15.2 14.1 15.5 15.2 15.9 22.1 14.9 ...
$ A3349884J: num 42.8 45.3 45.1 43.1 41.5 44.5 48.6 51.4 70.9 44.6 ...
$ A3349562T: num 67.5 69.7 60.7 67.9 66.5 ...
$ A3349348C: num 18.4 17.7 17.7 18.4 17.8 18.8 20.2 21.5 30.9 22.8 ...
$ A3349480L: num 11.1 11.7 11.5 13.1 13 13 12 13.2 16.2 12 ...
$ A3349476W: num 22 21.9 22.7 24.3 23.6 21.8 19.3 19.2 23.8 17.7 ...
$ A3349881A: num 25.8 25.9 25.9 28.7 27.7 29 27 29.7 41.5 27.8 ...
$ A3349410F: num 77.3 77.2 77.7 84.4 82.1 ...
$ A3349481R: num 18.7 19.5 18.6 22.6 22.6 23.2 20.8 22.7 24.5 20.5 ...
$ A3349718A: num 26.7 27.3 26.2 25.2 25.6 26.7 28.1 27.6 31.1 30.7 ...
$ A3349411J: num 45.4 46.8 44.8 47.8 48.2 49.8 48.8 50.4 55.7 51.2 ...
$ A3349638A: num 486 493 494 516 501 ...
$ A3349654A: num 83.5 80.6 82.3 88.2 82.3 84.2 88.9 87 99.1 82.7 ...
$ A3349499L: num 6 5.4 5.2 5.6 5.7 5.8 6.6 6.5 8.6 7.1 ...
$ A3349902A: num 11.3 11.1 11.2 12.1 11.7 12 12.7 12.2 14.5 12.5 ...
$ A3349432V: num 100.8 97.1 98.7 105.9 99.7 ...
$ A3349656F: num 15.2 17.2 17.4 18.7 18.6 18.8 18.7 21 23.8 19.7 ...
$ A3349361W: num 16 19 18.1 20.3 19.6 19.9 19.7 22.7 30.3 18.8 ...
$ A3349501L: num 8.6 9.5 8.4 10.3 10.6 11.5 10.8 13.1 25.4 9.2 ...
$ A3349503T: num 39.7 45.7 43.9 49.3 48.9 50.2 49.3 56.8 79.6 47.7 ...
$ A3349360V: num 19.1 21.6 18.3 18.6 17.1 18.2 20.7 23.6 33.4 20 ...
$ A3349903C: num 6.6 7 6 6.4 6 6.4 7.4 8 11.7 6.4 ...
$ A3349905J: num 25.7 28.6 24.3 25 23.1 24.6 28.1 31.6 45.1 26.4 ...
$ A3349658K: num 48.9 52.2 48.9 48.3 49.4 48.5 46.1 58.5 88.9 43.5 ...
$ A3349575C: num 8.1 7.5 6.7 7.8 7.9 7.8 7.6 8.8 12.9 8 ...
$ A3349428C: num 6.1 6.5 6.1 6.6 6.3 6.4 7.4 7.8 10.5 6.7 ...
$ A3349500K: num 7.2 7.5 7.5 7.9 8.3 7.8 8.4 8.8 11.1 8.1 ...
$ A3349577J: num 12.9 13 12.5 13.9 13.7 14.1 15 15.8 23.1 13.9 ...
$ A3349433W: num 34.2 34.4 32.7 36.2 36.1 36 38.4 41.2 57.6 36.6 ...
$ A3349576F: num 14.3 14.2 13.4 14.5 13.6 13.9 17.2 17.3 22.8 15.3 ...
$ A3349574A: num 15.8 15.8 15.3 17 17.5 17.8 20.6 20.9 24.8 24.2 ...
$ A3349816F: num 30.1 30 28.7 31.4 31.1 31.7 37.8 38.2 47.6 39.5 ...
$ A3349815C: num 279 288 277 296 288 ...
$ A3349744F: num 96.6 96.4 95.6 103.3 96.6 ...
$ A3349823C: num 12.3 11.8 11.3 12.1 12 12.3 14.2 14.2 16.2 15.7 ...
$ A3349508C: num 13.1 13.4 13.5 13.8 13.3 13.4 14.1 13.8 16 12.1 ...
$ A3349742A: num 122 122 120 129 122 ...
$ A3349661X: num 19.2 21.9 19.9 19.3 19.6 19.9 18 19 23 16.6 ...
$ A3349660W: num 22.5 27.8 26.7 28.2 27.4 27 25.5 27.4 37.6 25.8 ...
$ A3349909T: num 8.6 8.2 7.9 8.7 7.9 8.7 10.2 13.2 26.6 9.6 ...
$ A3349824F: num 50.4 57.9 54.4 56.2 55 55.6 53.6 59.6 87.2 52 ...
$ A3349507A: num 21.4 24.1 21.4 21.8 18.7 19.5 20.8 23.8 34.8 18.8 ...
$ A3349580W: num 7.4 8 7 7.2 6.6 7.4 8.3 8.8 13.1 7.2 ...
$ A3349825J: num 28.8 32.1 28.5 29 25.3 26.9 29.1 32.6 47.9 26 ...
$ A3349434X: num 36.5 43.7 38 42 38.5 40.2 37.4 42.4 71.9 35.6 ...
$ A3349822A: num 9.7 11 10.7 9 9.1 10 7.7 8.4 11.8 7.4 ...
$ A3349821X: num 6.5 7.2 6.6 7 6.8 7.1 7.5 7.9 11 6.7 ...
[list output truncated]
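Now we will explore one of the categories in the data.
# the first Series ID value in the str() output above is 1982-04-01, so the series starts in April 1982
myts = ts(retaildata[,"A3349873A"], start = c(1982,4), frequency = 12)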
autoplot(myts)
Observations :-
1. There is an increasing trend in the data.
2. Annual seasonality is present in the data and increases in size as the level increases.
3. Cyclic variation is also present, with cycles lasting 5-6 years.
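The str() output above lists the first Series ID value as 1982-04-01. Rather than typing the start of a series by hand, it can be derived from such a date column. This is a minimal sketch on the first three dates and the first three A3349873A values shown in that output. The names dates, start_vec and x are introduced here for illustration:

```r
# First three dates and A3349873A values, copied from the str() output
dates  <- as.Date(c("1982-04-01", "1982-05-01", "1982-06-01"))
values <- c(62.4, 63.1, 59.6)

# Build c(year, month) from the first date
first     <- dates[1]
start_vec <- c(as.integer(format(first, "%Y")),
               as.integer(format(first, "%m")))

x <- ts(values, start = start_vec, frequency = 12)
x  # printed with Apr, May and Jun 1982 as labels
```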
Let’s analyse the seasonality more closely.
ggseasonplot(myts, year.labels = TRUE, year.labels.left = TRUE)
ggseasonplot(myts, year.labels = TRUE, year.labels.left = TRUE, polar = TRUE)
ggsubseriesplot(myts)
gglagplot(myts)
There is a strong positive correlation at lag 12 because each value is plotted against the value from the same month of the previous year.
ggAcf(myts,lag.max = 36)
Observations :-
1. The autocorrelations at lag 12 and at multiples of 12 are higher than at other lags, because of the seasonality in the data.
2. The peaks get smaller as the lag increases, because of the presence of trend.
3. The “scalloped” shape is because of the seasonality in the data.
Now we will plot some time series.
autoplot(bicoal) +
ggtitle("Annual bituminous coal production") +
xlab("Year")
autoplot(chicken) +
ggtitle("Price of chicken in US") +
xlab("Year") +
ylab("Price (in $)")
autoplot(dole) +
ggtitle("No. of people on unemployment benefits in Australia") +
xlab("Month") +
ylab("Population")
autoplot(usdeaths) +
ggtitle("Monthly accidental deaths in USA") +
xlab("Month") +
ylab("No. of Deaths")
autoplot(lynx) +
ggtitle("Annual lynx trapping in Canada") +
xlab("Year") +
ylab("No. of lynx")
autoplot(goog) +
ggtitle("Closing stock prices of Google Inc") +
xlab("Daily") +
ylab("Prices/Unit")
autoplot(writing) +
ggtitle("Industry Sales for printing and writing paper") +
xlab("Monthly") +
ylab("Sales (in francs)")
autoplot(fancy) +
ggtitle("Monthly sales for a souvenir shop") +
xlab("Month") +
ylab("Sale Units")
autoplot(a10) +
ggtitle("Total monthly scripts for pharmaceutical products under ATC a10") +
xlab("Month") +
ylab("No. of scripts")
autoplot(h02) +
ggtitle("Total monthly scripts for pharmaceutical products under ATC h02") +
xlab("Month") +
ylab("No. of scripts")
Now we will explore the seasonal pattern in some of the datasets.
ggseasonplot(writing, year.labels = TRUE, year.labels.left = TRUE) + ggtitle("Industry Sales for printing and writing paper")
ggsubseriesplot(writing) + ggtitle("Industry Sales for printing and writing paper")
ggseasonplot(fancy, year.labels = TRUE, year.labels.left = TRUE) + ggtitle("Monthly sales for a souvenir shop")
ggsubseriesplot(fancy) + ggtitle("Monthly sales for a souvenir shop")
ggseasonplot(a10, year.labels = TRUE, year.labels.left = TRUE) + ggtitle("Total monthly scripts for pharmaceutical products under ATC a10")
ggsubseriesplot(a10) + ggtitle("Total monthly scripts for pharmaceutical products under ATC a10")
ggseasonplot(h02, year.labels = TRUE, year.labels.left = TRUE) + ggtitle("Total monthly scripts for pharmaceutical products under ATC h02")
ggsubseriesplot(h02) + ggtitle("Total monthly scripts for pharmaceutical products under ATC h02")
Now let’s explore some other datasets.
autoplot(hsales) +
ggtitle("Monthly sales of new one-family houses sold in the USA") +
xlab("Year") +
ylab("sales(in millions)")
ggseasonplot(hsales, year.labels = TRUE, year.labels.left = TRUE) +
ggtitle("Monthly sales of new one-family houses sold in the USA")
ggsubseriesplot(hsales) +
ggtitle("Monthly sales of new one-family houses sold in the USA")
gglagplot(hsales, lags = 24)
ggAcf(hsales, lag.max = 84)
Observations :-
1. The time plot shows no trend in the data.
2. A seasonal pattern is present, but the subseries plot gives a clearer picture of it.
3. Cyclic variation is present, with cycles lasting 7-9 years.
4. The years 1980, 1982 and 1986 show a significantly different pattern from other years in most seasons.
5. The year 1991 also shows the opposite pattern in some months.
6. The subseries plot shows that in the initial years, if sales are high in one year they tend to go down in the next, for almost all years. However, the size of these increases and decreases shrinks over time.
7. The ACF reflects point 6. There is a peak at lag 12, and the scalloped shape over the first year of lags shows the presence of seasonality. After that the correlations become more negative, which reflects the drop in sales in every alternate year.
autoplot(usdeaths) +
ggtitle("Monthly accidental deaths in USA") +
xlab("Year") +
ylab("No. of Deaths")
ggseasonplot(usdeaths, year.labels = TRUE, year.labels.left = TRUE) +
ggtitle("Monthly accidental deaths in USA")
ggsubseriesplot(usdeaths) +
ggtitle("Monthly accidental deaths in USA")
gglagplot(usdeaths, lags = 24)
ggAcf(usdeaths, lag.max = 24)
autoplot(bricksq) +
ggtitle("Quarterly Brick Production in Australia") +
xlab("Quarter") +
ylab("Production")
ggseasonplot(bricksq, year.labels = TRUE, year.labels.left = TRUE) +
ggtitle("Quarterly Brick Production in Australia")
ggsubseriesplot(bricksq) +
ggtitle("Quarterly Brick Production in Australia")
gglagplot(bricksq, lags = 24)
ggAcf(bricksq, lag.max = 24)
autoplot(sunspotarea) +
ggtitle("Annual Average sunspot area") +
xlab("year")
gglagplot(sunspotarea, lags = 24)
ggAcf(sunspotarea, lag.max = 24)
autoplot(gasoline) +
ggtitle("Gasoline product supply") +
xlab("Week") +
ylab("Million barrels per day")
ggseasonplot(gasoline, year.labels = TRUE, year.labels.left = TRUE) +
ggtitle("Gasoline product supply")
ggAcf(gasoline, lag.max = 72)
Now we will look at the international arrivals dataset, which covers 4 countries.
autoplot(arrivals[,c("Japan")]) +
ggtitle("Quarterly international arrivals for Japan") +
ylab("arrivals")
autoplot(arrivals[,c("NZ")]) +
ggtitle("Quarterly international arrivals for NZ") +
ylab("arrivals")
autoplot(arrivals[,c("UK")]) +
ggtitle("Quarterly international arrivals for UK") +
ylab("arrivals")
autoplot(arrivals[,c("US")]) +
ggtitle("Quarterly international arrivals for US") +
ylab("arrivals")
Japan :- 1. An increasing trend at first and then a decreasing trend which could be a part of a cyclic variation.
NZ :- 1. An increasing trend is present. 2. A seasonal pattern is also present, and it increases with time.
UK :- 1. An increasing trend is present. 2. A seasonal pattern is also present, and it increases with time.
US :- 1. An increasing trend is present, with cyclic variation that lasts for about 12-15 years. 2. A seasonal pattern is present.
ggseasonplot(arrivals[,c("Japan")], year.labels = TRUE, year.labels.left = TRUE) +
ggtitle("International Arrivals in Japan")
ggseasonplot(arrivals[,c("NZ")], year.labels = TRUE, year.labels.left = TRUE) +
ggtitle("International Arrivals in NZ")
ggseasonplot(arrivals[,c("US")], year.labels = TRUE, year.labels.left = TRUE) +
ggtitle("International Arrivals in US")
ggseasonplot(arrivals[,c("UK")], year.labels = TRUE, year.labels.left = TRUE) +
ggtitle("International Arrivals in UK")
Japan :- 1. The seasonal pattern in Q3 for the years up to 1988 is the opposite of the pattern from 1989 onwards. 2. The increases and decreases are also sharper in later years. 3. Some years show a different pattern in Q4 as well.
NZ :- 1. The years 2003 and 2011 show the opposite pattern in the last quarter.
US :- 1. The years 1988, 1991, 2000, 2009 and 2010 have a different pattern in the 3rd quarter. 2. The years 1983, 1990, 2000 and 2001 have the opposite pattern in the 4th quarter.
UK :- 1. Quarters 2-3 show the opposite pattern in some years.
ggsubseriesplot(arrivals[,c("Japan")]) +
ggtitle("International Arrivals in Japan") +
ylab("arrivals")
ggsubseriesplot(arrivals[,c("NZ")]) +
ggtitle("International Arrivals in NZ") +
ylab("arrivals")
ggsubseriesplot(arrivals[,c("US")]) +
ggtitle("International Arrivals in US") +
ylab("arrivals")
ggsubseriesplot(arrivals[,c("UK")]) +
ggtitle("International Arrivals in UK") +
ylab("arrivals")
Japan :- 1. An increasing trend at first and then a decreasing trend, which could be part of a cyclic variation, in all quarters. 2. There is a dip in one year, especially in quarters 2 and 4.
NZ :- No significantly unusual pattern is visible.
US :- Q3 seems to have 2 peaks in the initial years that do not appear in the other quarters.
UK :- All quarters have a similar pattern, but the increasing trend in the middle quarters is much smaller than in the other 2 quarters.
Monthly total number of pigs slaughtered in Victoria, Australia dataset
mypigs = window(pigs, start = 1990)
autoplot(mypigs) +
ggtitle("Monthly total number of pigs slaughtered")
ggAcf(mypigs)
Now we will explore Dow Jones Index data.
ddj = diff(dj)
autoplot(ddj) +
ggtitle("Change in Dow Jones Index")
ggAcf(ddj)
The changes in the Dow Jones index look like white noise.
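Why differencing can turn a wandering series into something close to white noise is easiest to see on a simulated random walk. The sketch below is an illustration under that assumption and makes no claim that the Dow Jones index follows exactly this model. The names e, rw and d are introduced here:

```r
set.seed(2024)  # assumption: fixed seed for reproducibility
e  <- rnorm(300)     # white noise shocks
rw <- ts(cumsum(e))  # random walk: each value is the previous value plus a shock
d  <- diff(rw)       # differencing recovers the shocks (all but the first)

acf(rw, plot = FALSE)$acf[2]  # lag-1 autocorrelation of the level, typically close to 1
acf(d,  plot = FALSE)$acf[2]  # lag-1 autocorrelation of the changes

# Share of ACF spikes of the differenced series outside the white noise bounds
r_d <- acf(d, plot = FALSE)$acf[-1]
mean(abs(r_d) > 2 / sqrt(length(d)))
```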